Genome-wide scan of pulmonary phenotypes on local ancestry ⟶ genes interacting with smoking

Andrey Ziyatdinov, Postdoctoral Fellow at HSPH

September 27, 2017
Statistical Genetics Meeting
Channing Division of Network Medicine

Outline

  • Background and project’s goals
  • Interaction model on local ancestry
  • Association results
  • Post-association analysis

Gene-environment interaction scans

↑ Sample Size ⟶ ↑ Power for GWAS

But this strategy is generally not successful for interaction GWAS

  • Example: Genome-wide scan of FEV1/FEV1-FVC on 50,008 individuals from UK Biobank (Wain et al. 2015) reveals 0 SNP-smoking interactions (p < 5e-08)

One of the alternative strategies is to aggregate genetic variants:

  1. Grouping into Genetic Risk Scores (GRS) (Aschard et al. 2017)
  2. Using the ancestry information (Aschard et al. 2015), (Park et al. 2016)

Genome of recently admixed individuals

Leveraging ancestry information

COPDGene African Americans dataset

  • 3.3K African Americans from the COPDGene project
  • 7 quantitative & 1 binary outcomes
    • FEV1/FEV1pp, FVC/FVCpp, FEV1_FVC, pctEmph_Slicer, TLCpp, finalGold
  • binary exposures
    • SmokCigNow, CigPerDayNow > 15, CompletedSchool > 2

37K long (>10kb) local ancestry segments (Parker et al. 2014)

Project goals

  1. Prove: local ancestry → gene-environment interactions
  2. Follow up on the ancestry-based findings
    • Fine-mapping on SNPs
    • Enrichment analysis
    • Check Gene Expression and Methylation

The COPDGene dataset is appropriate, as the proportion of African ancestry is associated with the risk of COPD (Kumar et al. 2010)

First association model

  • marginal effect: \(y \sim a_g + a_l\)
  • interaction effect: \(y \sim a_g + a_l + x_e + a_g * x_e + a_l * x_e\)
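The two formulas above can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the simulated ancestry values, the effect sizes in `TRUE_BETA`, and the plain least-squares fit. The covariates and random effects used in the actual analysis are left out.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def design(a_g, a_l, x_e):
    """Columns: intercept, a_g, a_l, x_e, a_g*x_e, a_l*x_e."""
    return [1.0, a_g, a_l, x_e, a_g * x_e, a_l * x_e]

random.seed(1)
TRUE_BETA = [0.0, 0.5, 0.0, -0.3, 0.0, 0.4]  # hypothetical effects
X, y = [], []
for _ in range(5000):
    a_g = random.uniform(0.6, 1.0)  # simulated global ancestry proportion
    # simulated local ancestry at one segment: 0, 1 or 2 copies
    a_l = float(sum(random.random() < a_g for _ in range(2)))
    x_e = float(random.random() < 0.5)  # simulated binary exposure
    row = design(a_g, a_l, x_e)
    X.append(row)
    y.append(sum(b * v for b, v in zip(TRUE_BETA, row)) + random.gauss(0, 1))

beta_hat = ols(X, y)  # beta_hat[5] estimates the a_l * x_e interaction
```

The marginal model corresponds to keeping only the first three columns of `design`. In the interaction model the coefficient of interest is the last one, `a_l * x_e`, while `a_g * x_e` adjusts for the interaction of global ancestry with the exposure.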

Confounding factors other than global ancestry \(a_g\):

  • trait-specific covariates, e.g.
    • FEV1 ~ Age + Age^2 + Gender + Height + PackYears + SmokCigNow
  • random effect on medical centers (\(\approx\) 5% of variance)
    • random effect of medical device for pctEmph_Slicer trait

This (marginal) model was used in (Parker et al. 2014).
Is it OK for interaction?

First QQ plots: marginal & interaction

Our approach to fix model misspecification

  1. Ancestry Relatedness Matrix (ARM) (Zaitlen et al. 2014)
  2. Another (EARM) for ancestry-exposure component (Sul et al. 2016)
  3. Modeling heteroskedasticity (Don’t depreciate exploratory plots!)
  4. Selection of smoking covariates
    • SmokCigNow + ATS_PackYearsDuration_Smoking + log_CigPerDaySmokAvg + SmokCigNow0_15 + SmokCigarNow

More details in our previous talk, “COPDGene African-Americans & QQ plots”.

Heteroskedasticity & Covariance Matrices

Results: Clean QQ plots (marginal)

Results: Nearly Clean QQ plots (interaction)

Ancestry-SmokCigNow (7 traits)

Zoom in on Chr 11 (2 representative traits)

Multi-trait test (5 traits)

\(z = [z_1; z_2; \dots]^T \sim N(0, \Sigma)\)
under the null hypothesis

  • Estimate covariance pairs
    • truncated normal
    • threshold 2.5
  • Apply the Omnibus test
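For reference, one common form of an omnibus test on correlated z-scores is the quadratic form below. That the talk used exactly this variant is an assumption. \(K\) denotes the number of traits (here 5), and \(\Sigma\) is the estimated covariance of the z-scores under the null.

\[
T = z^T \Sigma^{-1} z \sim \chi^2_K \quad \text{under the null hypothesis}
\]

When the traits are uncorrelated (\(\Sigma = I\)), the statistic reduces to the sum of squared z-scores, \(T = \sum_{k=1}^{K} z_k^2\).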

Results: Omnibus test (5 traits)

Results: Top Genes

Bonferroni 0.05 / 37K = 1.4e-06
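The threshold arithmetic, together with the conversion from a z-score to a two-sided p-value, can be sketched as follows. The two-sided normal test is an assumption about how the p-values were obtained. Because the z-scores in the tables are rounded, recomputed p-values will only approximately match the reported ones.

```python
import math

def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

n_segments = 37_000  # number of local ancestry segments tested
alpha = 0.05
bonferroni = alpha / n_segments  # about 1.4e-06

p_example = two_sided_p(4.5)  # about 6.8e-06 for a z-score of exactly 4.5
print(f"Bonferroni threshold: {bonferroni:.1e}")
print(f"p-value for z = 4.5:  {p_example:.1e}")
```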


Ancestry segment: 11:12,332,105 - 12,394,102
Genes within \(\pm\) 100kb: PARVA, MICAL2, MICALCL, RASSF10, TEAD1

Trait      Exposure     z-score   p-value
FEV1pp     SmokCigNow   4.5       7.3e-06
Omnibus    SmokCigNow   -         5.7e-05
FEV1_FVC   SmokCigNow   3.9       1.1e-04
FEV1       SmokCigNow   3.8       1.4e-04

Ancestry segment: 2:238,819,792 - 238,904,351
Genes within \(\pm\) 100kb: TWIST2, HDAC4, MIR4440, MIR4441

Trait      Exposure         z-score   p-value
FEV1       SmokCigNow0_15   4.2       2.6e-05
FEV1pp     SmokCigNow0_15   4.1       4.9e-05
FVC        SmokCigNow0_15   4.0       6.9e-05
Omnibus    SmokCigNow0_15   -         1.5e-04
FVCpp      SmokCigNow0_15   3.7       2.5e-04

Outline

  • Background and project’s goals
  • Interaction model on local ancestry
  • Association results
  • Post-association analysis (ongoing work)

The effective number of tests

How to perform enrichment analysis?

  1. Gene-set enrichment analysis (GSEA)
    • As simple as the Enrichr web tool
  2. Functional or tissue-specific enrichment
    • SNP-level resolution is required


Data Min size Mean size Genome coverage
Local ancestry 10kb 13kb 74%
ENCODE annotation 0.150kb 10%
Intersection 10kb 11kb 70%

What type of interactions do we detect?

Leveraging methylation data

We observed that the top associated genes have previously been reported to be associated with epigenetic changes.

Hypothesis: the mechanism of ancestry-based association is:
Smoking → Up/Down Methylation → COPD-related phenotype

Wan et al., Smoking-associated site-specific differential methylation in buccal mucosa in the COPDGene study (2015)

References

Aschard et al. 2015. “Leveraging local ancestry to detect gene-gene interactions in genome-wide data.” BMC Genetics 16 (1): 124. doi:10.1186/s12863-015-0283-z.

———. 2017. “Evidence for large-scale gene-by-smoking interaction effects on pulmonary function.” International Journal of Epidemiology 46 (3): 894–904. doi:10.1093/ije/dyw318.

Kumar et al. 2010. “Genetic Ancestry in Lung-Function Predictions.” New England Journal of Medicine 363 (4): 321–30. doi:10.1056/NEJMoa0907897.

Park et al. 2016. “An Ancestry Based Approach for Detecting Interactions.”

Parker et al. 2014. “Admixture mapping identifies a quantitative trait locus associated with FEV1/FVC in the COPDGene Study.” Genetic Epidemiology 38 (7): 652–59. doi:10.1002/gepi.21847.

Renier et al. 2016. “Mapping of brain activity by automated volume analysis of immediate early genes.” Cell 165 (7): 1789–1802. doi:10.1016/j.cell.2016.05.007.

Sul et al. 2016. “Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.” PLoS Genetics 12 (3): e1005849. doi:10.1371/journal.pgen.1005849.

Wain et al. 2015. “Novel insights into the genetics of smoking behaviour, lung function, and chronic obstructive pulmonary disease (UK BiLEVE): A genetic association study in UK Biobank.” The Lancet Respiratory Medicine 3 (10): 769–81. doi:10.1016/S2213-2600(15)00283-0.

Zaitlen et al. 2014. “Leveraging population admixture to characterize the heritability of complex traits.” Nature Genetics 46 (12): 1356–62. doi:10.1038/ng.3139.